17 research outputs found
Spatial features of reverberant speech: estimation and application to recognition and diarization
Distant-talking scenarios, such as hands-free calling or teleconference meetings, are essential for natural and comfortable human-machine interaction, and they are increasingly used in multiple contexts. The speech signal acquired in such scenarios is reverberant and affected by additive noise. This distortion degrades the performance of speech recognition and diarization systems, creating troublesome human-machine interactions. This thesis proposes a method to non-intrusively estimate room acoustic parameters, paying special attention to a room acoustic parameter highly correlated with speech recognition degradation: the clarity index. In addition, a method to provide information regarding the estimation accuracy is proposed. An analysis of the phoneme recognition performance for multiple reverberant environments is presented, from which a confusability metric for each phoneme is derived. This confusability metric is then employed to improve reverberant speech recognition performance. Room acoustic parameters can also be used in speech recognition to provide robustness against reverberation. A method that exploits clarity index estimates to perform reverberant speech recognition is introduced.
Finally, room acoustic parameters can also be used to diarize reverberant speech. A room acoustic parameter is proposed as an additional source of information for single-channel diarization in reverberant environments. In multi-channel environments, the time delay of arrival is a feature commonly used to diarize the input speech; however, the computation of this feature is affected by reverberation. A method is presented to model the time delay of arrival in a robust manner so that speaker diarization is performed more accurately.
Online Continual Learning in Keyword Spotting for Low-Resource Devices via Pooling High-Order Temporal Statistics
Keyword Spotting (KWS) models on embedded devices should adapt fast to new
user-defined words without forgetting previous ones. Embedded devices have
limited storage and computational resources, so they cannot save samples or
update large models. We consider the setup of embedded online continual
learning (EOCL), where KWS models with frozen backbone are trained to
incrementally recognize new words from a non-repeated stream of samples, seen
one at a time. To this end, we propose Temporal Aware Pooling (TAP) which
constructs an enriched feature space by computing high-order moments of speech
features extracted by a pre-trained backbone. Our method, TAP-SLDA, updates a
Gaussian model for each class on the enriched feature space to effectively use
audio representations. In experimental analyses, TAP-SLDA outperforms
competitors on several setups, backbones, and baselines, bringing a relative
average gain of 11.3% on the GSC dataset.
Comment: INTERSPEECH 202
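The pooling idea in the abstract above can be sketched roughly as follows. This is an illustrative sketch, not the authors' TAP-SLDA implementation: the function and class names are hypothetical, the choice of standardized central moments is an assumption, and the classifier is simplified to a nearest-class-mean rule updated one sample at a time (SLDA additionally maintains a shared covariance, which is omitted here).

```python
import math

def temporal_moment_pooling(frames, max_order=4):
    """Pool a (time x dim) list of feature frames into a single vector.

    For each feature dimension, emit the mean, the standard deviation, and
    the standardized central moments of order 3..max_order (assumed choice).
    """
    n = len(frames)
    dim = len(frames[0])
    pooled = []
    for d in range(dim):
        column = [frame[d] for frame in frames]
        mean = sum(column) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
        stats = [mean, std]
        for order in range(3, max_order + 1):
            central = sum((x - mean) ** order for x in column) / n
            # A constant feature has no spread; report 0.0 for its higher moments.
            stats.append(central / std ** order if std > 0 else 0.0)
        pooled.extend(stats)
    return pooled

class RunningClassMeans:
    """Hypothetical online classifier: one running mean per class, no stored samples."""

    def __init__(self):
        self.means = {}
        self.counts = {}

    def update(self, label, vector):
        # Incremental mean update from a single sample seen once.
        if label not in self.means:
            self.means[label] = [0.0] * len(vector)
            self.counts[label] = 0
        self.counts[label] += 1
        k = self.counts[label]
        self.means[label] = [m + (x - m) / k for m, x in zip(self.means[label], vector)]

    def predict(self, vector):
        # Nearest class mean under squared Euclidean distance.
        def dist(label):
            return sum((x - m) ** 2 for x, m in zip(vector, self.means[label]))
        return min(self.means, key=dist)
```

In this sketch, each utterance's backbone features would be pooled once and passed to `update`, so memory grows only with the number of classes, not with the number of samples.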
Research and researchers at the UAM
In this issue of the journal we continue the section "Research at the Universidad Autónoma de Madrid", which aims to publicize research in various scientific disciplines that has been carried out or is under way at the UAM. The goal is to describe this work, and with it the content of various branches of knowledge, in a simple and didactic way, thereby fulfilling the journal's inherent purpose of popularizing science and helping to spark ideas or initiatives for future research by young scientists, or by undergraduate or postgraduate students who are able and willing to become scientists. What follows are several accounts of research carried out by UAM professors, collected in a publication commemorating the university's fortieth anniversary and covering the following disciplines: Biomedicine, Contemporary History, Chemistry and Food, Mathematics, and Biochemistry.
ABCB1 C3435T, G2677T/A and C1236T variants have no effect in eslicarbazepine pharmacokinetics
Eslicarbazepine acetate is a third-generation anti-epileptic prodrug that is quickly and extensively transformed to eslicarbazepine after oral administration. In most patients managed with eslicarbazepine, the reduction in seizure frequency is only partial, and many suffer considerable adverse drug reactions (ADRs) that require a change of treatment. P-glycoprotein, encoded by the ABCB1 gene, is expressed throughout the body and can affect the pharmacokinetics of several drugs. In epilepsy treatment, this transporter has been linked to drug-resistant epilepsy, as it conditions drug access to the brain through its expression at the blood-brain barrier. We therefore aimed to investigate the impact of three common ABCB1 polymorphisms (i.e., C3435T, or rs1045642, G2677A or rs2032582 and C1236T or rs1128503) on the pharmacokinetics and safety of eslicarbazepine. For this purpose, 22 healthy volunteers participating in a bioequivalence clinical trial were recruited. No significant relationship was observed between sex, race, or ABCB1 polymorphism and eslicarbazepine pharmacokinetic variability. In contrast, the ABCB1 C1236T C/C diplotype was significantly related to the occurrence of ADRs: one volunteer with this genotype suffered dizziness, somnolence and hand paresthesia, while no other volunteer suffered any of these ADRs (p < 0.045). To the best of our knowledge, this is the first study published to date evaluating eslicarbazepine pharmacogenetics. Further studies with large sample sizes are needed to compare the results obtained here.
G. Villapalos-García is co-financed by Instituto de Salud Carlos III (ISCIII) and the European Social Fund (PFIS predoctoral grant, number FI20/00090). M. Navares-Gómez is financed by the ICI20/00131 grant, Acción Estratégica en Salud 2017–2020, ISCIII
Holmium:YAG laser ablation of upper urinary tract transitional cell carcinoma with new Olympus digital flexible ureteroscope
Upper urinary tract transitional (UUTT) cell carcinoma is a relatively uncommon urologic tumor. The traditional treatment approach is radical nephroureterectomy. However, in recent years, less-invasive treatments, including different nephron-sparing procedures, have become increasingly popular. We report a case of laser ablation of UUTT cell carcinoma using the new Olympus digital flexible ureteroscope (URF-V).
Reverberant speech recognition exploiting clarity index estimation
We present single-channel approaches to robust automatic speech recognition (ASR) in reverberant environments based on non-intrusive estimation of the clarity index (C50). Our best-performing method includes the estimated value of C50 in the ASR feature vector and also uses C50 to select the most suitable ASR acoustic model according to the reverberation level. We evaluate our method on the REVERB Challenge database employing two different C50 estimators and show that it outperforms the best baseline of the challenge achieved without unsupervised acoustic model adaptation, i.e. using multi-condition hidden Markov models (HMMs). Our approach achieves a 22.4% relative word error rate reduction in comparison to the best baseline of the challenge.
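For orientation, the quantity being estimated can be illustrated with the standard intrusive definition of the clarity index: the ratio, in dB, of the energy in the first 50 ms of a room impulse response to the energy arriving afterwards. The sketch below is not the non-intrusive estimator described in the abstract, which works from the speech signal alone; it only shows the reference value such an estimator targets. The function name is hypothetical, and treating the strongest tap as the direct-path arrival is an assumption.

```python
import math

def clarity_index_db(impulse_response, sample_rate, early_ms=50.0):
    """Clarity index in dB from a measured room impulse response.

    Assumes the direct path is the strongest tap and measures the early
    window from that sample onwards.
    """
    onset = max(range(len(impulse_response)), key=lambda i: abs(impulse_response[i]))
    split = onset + int(round(early_ms * 1e-3 * sample_rate))
    early = sum(x * x for x in impulse_response[onset:split])
    late = sum(x * x for x in impulse_response[split:])
    if late == 0.0:
        raise ValueError("no late energy: impulse response is shorter than the early window")
    return 10.0 * math.log10(early / late)
```

A higher value means more of the energy arrives early, i.e. a less reverberant condition, which is why the abstract can use the estimate to pick an acoustic model matched to the reverberation level.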
Analysis of prediction intervals for non-intrusive estimation of speech clarity index
We present an analysis of prediction intervals for a non-intrusive method to estimate the clarity index (C50). The method employed to estimate C50 is a data-driven approach that extracts multiple features from a reverberant speech signal; these features are then used to train a bidirectional long short-term memory model that maps the feature space to the target C50 value. The prediction intervals are derived from the standard deviation of the per-frame C50 estimates. This approach was shown to provide a coverage probability of 80%, i.e. 80% of the time the ground truth lies within the estimated interval, where the interval bounds are computed using 5.6 times the standard deviation of the per-frame estimates. This accuracy is shown to be consistent across other noisy reverberant environments.
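The interval construction and the coverage figure described above can be sketched as follows. This is a sketch under stated assumptions, not the authors' code: the function names are hypothetical, the interval is assumed to be centred on the mean of the per-frame estimates, and the scale factor is assumed to multiply the standard deviation on each side (the abstract does not say whether 5.6 refers to the half-width or the full width).

```python
import statistics

def prediction_interval(per_frame_estimates, scale=5.6):
    """Interval around the utterance-level estimate.

    Assumption: centre = mean of per-frame estimates,
    half-width = scale * population standard deviation.
    """
    center = statistics.fmean(per_frame_estimates)
    spread = statistics.pstdev(per_frame_estimates)
    return center - scale * spread, center + scale * spread

def coverage_probability(intervals, ground_truths):
    """Fraction of utterances whose ground truth falls inside its interval."""
    hits = sum(1 for (low, high), truth in zip(intervals, ground_truths)
               if low <= truth <= high)
    return hits / len(ground_truths)
```

Under these assumptions, a reported coverage of 80% corresponds to `coverage_probability` returning 0.8 over the evaluation set.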